Statistical prosodic modeling: from corpus design to parameter estimation
Abstract
The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, recently created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. This paper focuses on the use of the Victoria corpus in the statistical estimation of duration and pitch models for Apple’s next-generation text-to-speech system in Macintosh OS X. Duration modeling relies primarily on the subcorpus of prosodic contexts, which is instrumental in uncovering empirical evidence in favor of a piecewise linear transformation in the well-known sums-of-products approach. Pitch modeling relies primarily on the subcorpus of reiterant speech, which makes it possible to optimize superpositional pitch models with more accurate underlying smooth contours. Experimental results illustrate the improved prosodic representation resulting from these new duration and pitch models.
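As a rough illustration of the duration-modeling idea named in the abstract, the sketch below evaluates a generic sums-of-products score and passes it through a piecewise linear map. The abstract does not give the paper's actual formulation, so everything here is an assumption for illustration: the function names, the factor tables, the knot positions, and the choice to apply the transformation to the summed score are all hypothetical.

```python
from bisect import bisect_right
from math import prod


def piecewise_linear(x, xs, ys):
    """Evaluate a piecewise linear function given by knots (xs[i], ys[i]).

    xs must be strictly increasing. Values outside the knot range are
    extrapolated along the first or last segment.
    """
    # Index of the segment containing x, clamped to a valid segment.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])


def sums_of_products(context, terms):
    """Sum over terms of the product of per-factor scales.

    Each term maps a factor name to a table {factor level: scale}.
    `context` maps each factor name to the level observed for one segment.
    """
    return sum(
        prod(table[context[name]] for name, table in term.items())
        for term in terms
    )


# Hypothetical factor tables (made-up numbers, not estimates from any corpus).
terms = [
    {
        "phone": {"a": 80.0, "t": 40.0},
        "stress": {"stressed": 1.5, "unstressed": 1.0},
    },
    {"position": {"final": 30.0, "medial": 0.0}},
]
context = {"phone": "a", "stress": "stressed", "position": "final"}

score = sums_of_products(context, terms)

# Hypothetical knots: identity up to 100, compressed above it.
duration_ms = piecewise_linear(score, [0.0, 100.0, 200.0], [0.0, 100.0, 160.0])
print(score, duration_ms)
```

In a real system the scales and knots would be estimated from data such as the prosodic-contexts subcorpus; here they only show how the two pieces compose.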
Similar resources
Modeling segmental duration in German text-to-speech synthesis
This paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has been implemented in the German version of the Bell Labs text-to-speech system [18, 12]. The construction of the duration system was made efficient by the use of an interactive statistical analysis packag...
A Hierarchical Stochastic Model for Automatic Prediction of Prosodic Boundary Location
Prosodic phrase structure provides important information for the understanding and naturalness of synthetic speech, and a good model of prosodic phrases has applications in both speech synthesis and speech understanding. This work describes a statistical model of an embedded hierarchy of prosodic phrase structure, motivated by results in linguistic theory. Each level of the hierarchy is modeled...
Design and collection of a corpus of polyphones and prosodic contexts for speech synthesis research and development
The design principles and collection procedures behind a speech synthesis corpus directly impact the performance of the resulting text-to-speech system. This paper describes the design and collection of the Victoria corpus, created to support speech synthesis research and development at Apple Computer. This corpus is composed of five constituent parts, each designed to cover a specific aspect of s...
Creation and utilisation of the MediaTeam Emotional Speech Corpus
The MediaTeam Emotional Speech Corpus is currently the largest database of emotional speech for colloquial modern Finnish, containing simulated emotional content. The specific aim of the research is to investigate in detail the phonetic and phonological/linguistic correlates of basic or primary emotions in spoken Finnish, to develop statistical classification methods of emotional speech signals...
Application of statistical techniques and artificial neural network to estimate force from sEMG signals
This paper presents an application of design of experiments techniques to determine the optimized parameters of an artificial neural network (ANN), which are used to estimate force from electromyogram (sEMG) signals. The accuracy of the ANN model is highly dependent on the network parameter settings. There are plenty of algorithms that are used to obtain the optimal ANN setting. However, to the best ...
Journal title:
- IEEE Trans. Speech and Audio Processing
Volume 9
Publication date 2001